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Abstract 

In this paper we define a discrete dynamical system that governs the evolution of a population of agents. From the 
dynamical system, a variant of Differential Evolution is derived. It is then demonstrated that, under some assumptions 
on the differential mutation strategy and on the local structure of the objective function, the proposed dynamical system 
has fixed points towards which it converges with probability one for an infinite number of generations. This property 
is used to derive an algorithm that performs better than standard Differential Evolution on some space trajectory 
optimization problems. The novel algorithm is then extended with a guided restart procedure that further increases 
the performance, reducing the probability of stagnation in deceptive local minima. 

Index Terms 

Differential Evolution, Global Trajectory Optimization. 

I. Introduction 

Some evolutionary heuristics can be interpreted as discrete dynamical systems governing the movements of a set 
of agents (or particles) in the search space. This is well known for Particle Swarm Optimization (PSO) where the 
variation of the velocity of each particle in the swarm is defined by a control term made of a social component 
and a individual (or cognitive) component [l]-[4]. The social component can be interpreted as a behavior dictated 
by the knowledge acquired by the whole swarm of particles, while the cognitive component can be interpreted as 
a behavior dictated by the knowledge acquired by each individual particle. 

The same principle can be generalized and extended to other evolutionary heuristics such as Differential Evolution 
(DE) [5]. The analysis of a discrete dynamical system governed by the heuristics generating the behavior of 
individuals in DE, and in Evolutionary Computation in general, would allow for a number of considerations on the 
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evolution of the search process and therefore on the convergence properties of a global optimization algorithm. In 
particular, four outcomes of the search process are possible: 

• Divergence to infinity. In this case the discrete dynamical system is unstable, and the global optimization 
algorithm is not convergent. 

• Convergence to a fixed point in D. In this case the global optimization algorithm is simply convergent in D and 
we can define a stopping criterion. Once the search is stopped we can define a restart procedure. Depending 
on the convergence profile, the use of a restart procedure can be more or less efficient. 

• Convergence to a limit cycle in which the same points in D are re-sampled periodically. Even in this case we 
can define a stopping criterion and a restart procedure. 

• Convergence to a strange (chaotic) attractor. In this case a stopping criterion cannot be clearly defined because 
different points are sampled at different iterations. 

All outcomes are generally interesting to understand the evolution process. Identifying under which conditions 
divergence occurs is important to properly design an algorithm, or to define the appropriate setting, in particular in 
the case of automatic adaptation of some key parameters. Divergence can be seen as the opposite of the intensification 
process and can increase diversity and exploration. The convergence to a chaotic attractor can represent an interesting 
case of dense random sampling of an extended region. When this happens, the algorithm repeatedly samples the 
same region but never re-samples the same points. This mechanism could be useful to reconstruct an extended set 
with a specific property (e.g. f(x) < e, with / the objective function and e a threshold). 

The main interest, in this paper, is in the most commonly desired outcome: the convergence to a fixed point (see for 
example [6], in the context of global root finding, for methods to eliminate undesirable behaviors). The convergence 
to a fixed point, even different from an optimum, can be used to induce a restart of the search process with minimum 
waste of resources (as it will be demonstrated in this paper). 

The analysis of the dynamical properties of the dynamical system associated with a particular heuristic can give some 
insights into the balance between global and local exploration and the volume of the search space that is covered 
during the search. Extensive work on the dynamics of Genetic Algorithms and general Evolutionary Algorithms 
can be found in the studies of Priigel-Bennett et al. [7], [8] and Beyer [9]. Non-evolutionary examples of the use 
of dynamical system theory to derive effective global optimization schemes can be found in the works of Sertl and 
Dellnitz [10]. 

In this paper we propose a discrete dynamical system, or discrete map, governing the evolution of a population of 
individuals, being each individual a point in a d-dimensional domain D. From the discrete dynamical system we 
derive a variant of Differential Evolution and we study its convergence properties. In particular, it is proven that, 
under some assumptions, the dynamics of the proposed variant of DE converges to a fixed point or to a level set. 
Note that, the proofs proposed in this paper considers the whole d-dimensional discrete dynamical system associated 
to the evolution of the population. 

The theoretical results are then used to derive a novel algorithm that performs better than DE strategies DE/rand/l/bin 
and DE/best/l/bin [11] on some difficult space trajectory design problems. The novel algorithm is based on a 
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hybridization of the proposed DE variant and the logic behind Monotonic Basin Hopping (MBH) [12]. 
The paper is organized as follows: after introducing the dynamics of a population of agents, three convergence 
theorems are demonstrated. Then an inflationary DE algorithm based on a hybridization between a variant of DE 
and MBH is derived. A description of the test cases and testing procedure follows. After presenting the results of 
the tests, the search space is analyzed to derive some considerations on the general applicability of the results. 

II. Agent Dynamics 

In this section we start by defining a generic discrete dynamical system governing the motion of an agent in a 
generic search space D. From this general description we derive a variant of what Storn and Price defined as the 
basic DE strategy [11], 

If we consider that a candidate solution vector in a generic d-dimensional box, 

D = [6,(1) Ml)] x [6,(2) b u (2)] x ...[kid) b u (d)} (1) 

is associated to an agent, then the heuristic governing its motion in D can be written as the following discrete 
dynamical system: 

Vi,fe+1 = (1 - c)Vi,fc + u t . k 

x.;,fc+i = x^/j + vSfc^ + v^k+itXijkjVhk+i 

with 

v = mm([u moa .,u i)fc+1 ])/« j)fc+1 (3) 

The function Sfak + Vj k+i, x^) is a selection operator that can be either 1 if the candidate point x^fe + v^fc+i 

is accepted or if it is not accepted. Different evolutionary algorithms have different ways to define S. 

The control u^fc defines the next point that will be sampled for each one of the existing points in the solution 

space, the vectors x^/. and v$ /. define the current state of the agent in the solution space at stage k of the search 

process, x^jt+i and v$ k+i define the state of the agent in the solution space at stage k + 1 of the search process 

and c is a viscosity, or dissipative coefficient, for the process. Eq. OJ represents a limit sphere around the agent 

Xj,fc at stage k of the search process. Different evolutionary algorithms have different ways to define Uj,fc, v and c 

(see for example the dynamics of PSO [1]). In this paper we will focus on the way 11$,*, v and c are defined in 

Differential Evolution. 

Now consider the u,-,* defined by: 

u*,* = e [(x l3jfe - x iifc ) + F(x i2yk - x n , fe )] (4) 

where i\, ii and 13 are integer numbers randomly chosen in the interval [1, n pop ] C N of indexes of the population, 
and e is a mask containing a random number of and 1 according to: 

f 1 => r,- < C R 

e(j) = \ °~ (5) 
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with j randomly chosen in the interval [1 d], so that e contains at least one nonzero component, d the dimension of 
the solution vector, rj a random number taken from a random uniform distribution rj 6 C/[0, 1] and Cr a constant. 
The product between e and [(xj 3 .fc — x^) + F(x J2 .fc — x^.^)] in Eq. has to be intended component-wise. The 
index i% can be chosen at random (this option will be called exploration strategy in the remainder of this paper) or 
can be the index of the best solution vector Xb es t (this option will be called convergence strategy in the remainder 
of this paper). Selecting the best solution vector or a random one changes significantly the convergence speed of the 
algorithm. The selection function S can be either 1 or depending on the relative value of the objective function 
of the new candidate individual generated with Eq. with respect to the one of the current individual. 
In other words the selection function S can be expressed as: 

c , s f 1 if /( x a + Ui.fe) < /(xi.fc) 

<S(x ijfc + u it k,Xi,k) = | (6) 
otherwise 

Note that in this paper we consider minimization problems in which the lowest value of / is sought. 
If one takes c = 1, v = 1 and v max = +oo, then map (f2]i reduces to: 

x i,fc+i = x i,fe + S(x.i^k + Ut,fc, Xj^u^fc (7) 

In the general case the indices i\, i% and 13 can assume any value. However, if the three indexes i\, 12 and 13 are 
restricted to be mutually different, map ([7), with m t k defined in ©, e defined in <(5j and S defined in is the DE 
basic strategy DE/rand/l/bin defined by Storn and Price in [11]. If 13 is taken as the index of the best individual 
ibest, and i\, 12 and 13 are mutually different, then one can obtain the DE strategy DE/best/l/bin. In fact, if © is 
substituted in (0 one gets: 

x iifc+ i = (1 - S'e)x i!fc + Se [(x i3 , fe + F(x i2:k - x iljfe )] (8) 

Now, if the selection function does not accept the candidate point x,.fe + u^., then S = 0, therefore (|8) reduces to 
Xi,fc+i = Xi fc (i-e. the state of the agent remains unchanged). If the candidate point is accepted instead, then the 
new location of the agent is: 

Xi,fe+i = (1 - e)x l>fc + e [(Xj 3ifc + F(yi l2M - x Uifc )] (9) 

The quantity in square brackets is the basic DE strategy DE/rand/l/bin or the variant DE/best/l/bin, respectively 
for 13 random or 13 = ibest- The mask e together with (1 — e)xi ! fc represent the cross-over operator in the basic 
DE strategy [11], with 1 a vector of l's and the products lxf * and ex; & that are both component- wise. In fact, 
Xj,fc-|_i is made of the components of x, & that correspond to the zero elements of e, and the components of 
[( x i 3 ,fe + F{ yL i 2 ,k ~ x i%k)] tna t correspond to the nonzero elements of e. If the assumption of mutually different 
indexes is dropped then map can be seen as a further variant of the basic DE strategy. 
In compact matrix form for the entire population, map ([7} can be written as: 

X fc+1 = J fc X fc (10) 

with the i-th line of matrix e R™p°p xd j s point x^fe. 
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To be more precise, in the case Xj,fc + i D, then every component j violating the boundaries b/ and b M is 
projected back into D by picking the new value x(j)i t k+i — bi(j) + s(b u (j) — h(j)), where <; is taken from a 
random uniform distribution c; € U[0, 1]. The interest is now in the properties of map ([ToV We start by observing 
that if S(x iifc + u iik ,Xi,k) = 1 f(xi,k + Ui,k) < f(*i,k), for i = 1, ...,n pop , the global minimizer x g e £> is a 
fixed point for map (0 since every point x G D is such that /(x) > /(x 9 ). 

Then, let us assume that at every iteration fc we can find two connected subsets D/~ and Z)J of D such that 
/(x fc ) < /(x£),Vx fc e D k ,*i*x* k € -Dfc \ -Dfc, and let us also assume that P fe C L» fc while P k+1 C D| (recall that 
Pfe and Pfc+i denote the populations at iteration k and k + 1 respectively). If x; is the lowest local minimum in 
Dfc, then x; is a fixed point in Dk for map ©. In fact, every point generated by map (fT) must be in and 

/(x,)</(x),Vx6fl fc . 

Moreover, under the above assumptions the reciprocal distance of the individuals cannot grow indefinitely because 
of the map (0, therefore the map cannot be divergent. 

Finally, a matrix Xfc whose lines are n pop replications of a unique point x 6 D, is always a fixed point for the 
map (TfOl l. 

Now one can consider two variants of Differential Evolution: one in which index i\ can be equal to index but 
both are different from 13 and one in which index i\ can be equal to index 13 but both are different from i^. In 
these cases two interesting results can be proven. First of all we prove that if %\ can be equal to index then the 
population can collapse to a single point in D. 

THEOREM II. 1 If, for every k, i\ is equal to 12 with strictly positive probability and f(xi beat ,k) = /( x i,fc) ^ 
^■i be3t ,k — X-i^fi-e., the set of best points within the population is made up by a single point, multiple copies of 
which possibly exist), then the population collapses to a single point with probability 1 for k — > 00 under the effect 
of the discrete dynamical system (0 

Proof: If i\ is equal to 12 with strictly positive probability, and /(xi best fc) = /(x^fc) <^ Xi 6e3t .fc = x^.fc for every 
k, then map (0 at each iteration k can generate, with strictly positive probability, a displacement U;.fc = (xi 3 .fc— x^fc) 
for each member i, i — 1, . . . ,n pop . This happens if, for each i £ [1, . . . ,n pop ], the following event, whose 
probability is strictly positive, occurs 

e(s) = 1, s = l,...,d, h^i\=i2- 

h e {i ■ f(xi,k) = f{xi bast ,k)} (11) 

Then, at each iteration we have a strictly positive probability that the two or more individuals collapse into the 
single point Xi best ^k and for k — s> 00 the whole population collapses to a single point with probability one. ■ 
After a collapse, the population cannot progress further and needs to be restarted. It is important to evaluate the 
probability of a total collapse, in fact if the collapse is progressive the population can keep on exploring but if 
the collapse is instantaneous the evolution ceases. Let us analyze the worst case in which, 23 = ibest- Then if for 
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all the individuals i\ = i 2 and e = 1 we have a total instantaneous collapse. The probability of having i\ = i 2 
is 1/ripop and the probability of having n pop — 1 individuals collapsing at the same iteration k is (l/n pop ) npop ~ 1 . 
Furthermore, given a Cr ^ 1 the probability to have e = 1 is Cjf 1 . Thus the total probability of an instantaneous 
total collapse is [(l/«pop)C^ 1 ]" pop_1 . The event has positive but small probability to happen. The complementary 
probability is 1 — [(1 /n p0 p)G"^ 1 } npop ~ 1 and the probability to have at least one total collapse after kh generations 
is 1 — {1 — [(1 /n pop )C'^ 1 } npop ^ 1 } kh . Therefore, allowing the indexes to assume the value i\ — i 2 introduces the 
following interesting property. If the population is stagnating, and the condition /(x i()est) / s ) = /(xi jfc ) x ibcst _ k = 
x i fc holds true, eventually there will be a total collapse and the population can be restarted with no risk to interrupt 
the evolutionary process. 

If ii is equal to « 3 with strictly positive probability but both are always different from i 2 , then convergence to a 
fixed point can be guaranteed if the function / is strictly quasi-convex [13] in D, and D is compact and convex. In 
other words, under the given assumptions, the population will converge to a single point. We immediately remark 
that such point is not necessarily a local minimizer of the problem. 

LEMMA II. 2 Let f be a continuous and strictly quasi-convex function on a set D and let us assume that D is 
compact, convex and is not a singleton. Then, the following minimization problem with F G (0, 1) has a strictly 
positive minimum value S r (e) for e small enough: 

S r (e)= min g(y lt y 2 ) = /(y 2 ) - f(F Yl + (1 - F)y 2 ) 

s± yi ,y a sD (i2) 

||yi - || > e 
/(yi) < /(y 2 ) 

Proof: 

Since / is strictly quasi-convex g(yi, y 2 ) > 0, Vyi, y 2 G D; furthermore, the feasible region is nonempty (if e 
is small enough and D is not a singleton) and compact. Therefore, according to Weierstrass' theorem the function 
g attains its minimum value over the feasible region. If we denote by (yj , y^ ) a global minimum point of the 
problem, then we have 

S r (e) = g(yly* 2 ) > o. (13) 



THEOREM II. 3 Assume that index i\ can be equal to Given a function f that is strictly quasi-convex over 
the compact and convex set D, and a population P k G D, then if F G (0, 1) and S{~x.i.k + Uj^x^fc) = 1 <S=> 
/(x^fc + Uj fc) < /(x^fc), for i = 1, n p0 p , the population Pk converges to a single point in D for k — > oo with 
probability one. 

Proof: By contradiction let us assume that we do not have convergence to a fixed point. Then, it must hold 
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that: 

inf max{||x ijfe -Xj ife ||,i,j <E [1, n po p]} > e > (14) 

k 

At every generation k the map can generate with a strictly positive probability, a displacement F{x.i* _ k — x j*,fc) 
for Xj.^, where i* and j* identify the individuals with the maximal reciprocal distance, such that the candidate 
point is x ccm( i = Fx^ fe + (1 — F)x.j- , k with /(xi«.fe) < f(xj*, k ). Since the function / is strictly quasi-convex, 
the candidate point is certainly better than xj* _ k and, therefore, is accepted by S. Now, in view of Eq. ( fT4b and of 
Lemma ( 111.21 ) we must have that, 

f{*cand) < /(Xj.fc) - £ r (e) (15) 

Such reduction will occur with probability one infinitely often, and consequently the function value of at least one 
individual will be, with probability one, infinitely often reduced by S r (e). But in this way the value of the objective 
function of such individual would diverge to — oo, which is a contradiction because / is bounded from below over 
the compact set D. ■ 
If a local minimum satisfies some regularity assumptions (e.g., the Hessian at the local minimum is definite positive), 
then we can always define a neighborhood such that: (i) map (0 will be unable to accept points outside the 
neighborhood; (ii) the function is strictly convex within the neighborhood. Therefore, if at some iteration k the 
population P k belongs to such a neighborhood, we can guarantee that map (TTOb will certainly converge to a fixed 
point made up by n pop replications of a single point belonging to the neighborhood. For general functions, we 
can not always guarantee that the population will converge to a fixed point, but we can show that the maximum 
difference between the objective function values in the population converges to 0, i.e. the points in the population 
tend to belong to the same level set. The proof is closely related to that of Theorem II. 1 and we still need to assume 
that indices i\ and can be equal. 

THEOREM II. 4 Assume that index i\ can be equal to ii. Given a function f, limited from below over D, and a 
population P k G D, then if F G (0,1) and S(-x itk +u^ k ,x^ k ) = 1 <^> /(x^fc +Uj,fc) < /(xj,fc), fori = l,...,n pop 
, the following holds 

max | /(xj.fc) - /(x 4 . fe ) |-> 0, (16) 

i,je[l,—,n p „ p ] 

as k — > oo with probability one. 

Proof: Let denote the set of best points in population P k , i.e. 

S k = { x j",fe : /( x i,fc) < f( x i,k) V i G [1, . . . , ripop] } (17) 

At each iteration k there is a strictly positive probability that the whole population will be reduced to at the 
next iteration. To show this it is enough to substitute the condition stated in ( ITTb with the following condition 

*3 e {i ■ Xi,k G SI} 
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(basically, with respect to Theorem II. 1, we only drop the requirement that at each iteration the best value of the 
population is attained at a single point). 

In other words, there is a strictly positive probability for the event that the population at a given iteration will be 
made up of points all with the same objective function value. Therefore, such an event will occur infinitely often 
with probability one. Let us denote with {kh]h=i.... the infinite subsequence of iterations at which the event is 
true, and let 

A ft = /(x a J - /( Xi , fch+1 ) (18) 

be the difference in the objective function values at two consecutive iterations kh and kh+i (note that, since at 
iterations kh, h — 1,... the objective function values are all equal and any i can be employed in the above 
definition). It holds that for all i,j 6 [1, . . ., n pop ] 

| /( Xj - fc ) - / (x j)fe ) | < A h Vfce [k h , k h+1 ] (19) 

Therefore, if we are able to prove that — > 0, as h — > oo, then we can also prove the result of our theorem. Let 
us assume by contradiction that A/j -/> 0. Then, there will exist a <5 > such that Ah > 5 infinitely many times. 
But this would lead to function values diverging to — oo and, consequently, to a contradiction. ■ 
As a consequence of these results, for the choice of the index i\, ii and i% non-mutually different, a possible 
stopping criterion for the dynamics in Eq. (|7) would be to stop when the difference between the function values 
in the population drops below a given threshold. However, this could cause a premature halt of the evolutionary 
process. Indeed, even if at some iteration the function value at all points of the population is equal, this does not 
necessarily mean that the algorithm will be unable to make further progress (unless all points in the population are 
multiple copies of a unique point). Therefore, since the evolution definitely ceases when the population contracts to 
a single point, we can alternatively use as a stopping criterion the fact that the maximum distance between points 
in the population drops below a given threshold. 

It is important to observe that the contraction of the population does not depend on whether the function / is 
minimized or maximized but depends only on the definition of S and on whether the function is bounded or 
unbounded. 

To further verify the contraction properties of the dynamics in Eq. © one can look at the eigenvalues of the matrix 
Jfe. 

If the population cannot diverge, the eigenvalues cannot have a norm always > 1. Furthermore, according to 
Theorem [TO] if the function / is strictly quasi-convex in D, the population converges to a single point in D, which 
implies that the map © is a contraction in D and therefore the eigenvalues should have a norm on average lower 
than 1. This can be illustrated with the following test: Consider a population of 8 individuals and a D enclosing 
the minimum of a paraboloid with the minimum at the origin. For a Cr=1.0 and F=0.8, we compute, for each step 
k, the distances of the closest and farthest individuals from the local minimum and the eigenvalues of the matrix 
J. Fig. Q] shows the behavior of the eigenvalues and of the distance from the origin. From the figure, we can see 
that for all iterations, the value of the norm of all the eigenvalues is in the interval [0, 1] except for one eigenvalue 
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Fig. 1 . Contraction of map 

ED: a) max and min distance of the individuals in the population from the origin, b) eigenvalues with the number 

of evolutionary iterations. 



at iteration 12. However, since every expansion is not accepted by the selection function S and for each iteration 
a number of eigenvalues have modulus lower than 1, the population contracts as represented in Fig. [TJa)- 
If multiple minima are contained in D, then it can be experimentally verified that the population contracts to a 
number of clusters initially converging to a number of local minima and eventually to the lowest among all the 
identified local minima. 

The local convergence properties of map (0 suggest its hybridization with the heuristic implemented in Monotonic 
Basin Hopping (MBH). 

MBH, first introduced in [14] in the context of molecular conformation problems) is a simple but effective global 
optimization method. At each iteration MBH: (i) generates a sample point within a neighborhood of size 2A of the 
current local minimum (e.g., by a random displacement of each coordinate of the current local minimum); (ii) starts 
a local search from the newly generated sample point; (iii) moves to the newly detected local minimum only if its 
function value is better than the function value at the current local minimum. The initial local minimum is usually 
randomly generated within the feasible region. Moreover, if no improvement is observed for a predetermined number 
of sample points n samp / es , a restart mechanism might be activated. The neighborhood of the local minimum x; is 
defined as [x/ — A,x; + A] d . A proper definition of A is essential for the performance of MBH: too small a size 
would not allow MBH to escape from the current local minimum, while too large a value would make the search 
degenerate in a completely random one. The local search performed at each iteration can be viewed as a dynamical 
system where the evolution of the systems at each iteration is controlled by some map. Under suitable assumptions, 
the systems converge to a fixed point. For instance, if / is convex and C 2 in a small enough domain containing a 
local minimum which satisfies some regularity conditions, Newton's map converges quadratically to a single fixed 
point (the local minimum) within the domain. This observation leads to the above mentioned hybridization of DE 
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with MBH: the dynamical system corresponding to a local search is replaced by the one corresponding to DE. 
More precisely, if map 0, either for a cluster P su b C P k or for the entire population P k , contracts, we can define 
a bubble D\ C D, around the best point within the cluster Xf, est , and re-initialize a subpopulation P su b in Di. Such 
operation is performed as soon as the maximum distance pa — max(||xi — Xj ||) among the elements in the cluster 
collapses below a value tol conv pA,max, where pA,max is the maximum pa recorded during the convergence of the 
map (fTOl l. 

In order to speed up convergence, the best solution x;, esf of the cluster is refined through a local search started at it, 
leading to a local minimum x;, which is saved in an archive A g . The bubble is defined, similarly to the neighborhood 
of MBH, as Di = [x; — A x; + A] d . The overall process leads to Algorithm Q] Note that the contraction of the 
population given, for example, by the metric pa, is a stopping criterion that does not depend explicitly on the value 
of the objective function but on the contractive properties of the map in Eq. @. Some remarks follow. 

1. Convergence of DE to a single point can not always be guaranteed, and consequently, we can not always 
guarantee that the contraction condition pa < tol conv pA.max will be satisfied at some iteration. Therefore, in 
order to take into account this possibility, we need to introduce a further stopping criterion for DE, such as 
a maximum number of iterations. We point out, however, that such alternative stopping criterion has never 
become active in our experiments. 

2. Even when DE converges to a single point, this is not guaranteed to be a local minimum. For this reason we 
always refine the best observed solution through a local search. 

3. If the search space is characterized by a single funnel structure [15], the restart of the population in the bubble 
allows the algorithm to move towards the global optimum by progressively jumping from one minimum to a 
better one. On the other hand, if multiple funnels or multiple isolated minima exist, a simple restart of the 
population inside a bubble might not be sufficient to avoid stagnation. A way to overcome this problem is 
to use global re-sampling: when the value of the best solution does not change for a predefined number of 
iterations iun max , the population is restarted. The restart procedure collects the solutions in the archive into 
n c clusters with baricenter x c j for j = 1, n c , then each agent x^ of the new population is generated so that 
||xi — x C)J -|| > 5 C . Note that a restart mechanism for DE was previously proposed also by Peng et al. [16] in a 
variant of the adaptive Differential Evolution JADE. However, in the work of Peng et al. the restart criterion 
and restart strategy are substantially different from the ones proposed here. For example, although we record 
the local minima in an external archive, we do not prevent the algorithm from searching in the surroundings 
of the recorded minima. On the contrary, we combine a local restart in a bubble surrounding the final point 
returned by DE, according to the heuristics of MBH, with a more global restart sampling outside the bubbles. 
A complete review of the existing variants of Differential Evolution can found in the work of Neri et al. 
[17]. Other restart mechanisms have recently been adopted to improve other evolutionary algorithms such as 
G-CMAES [18] or hybrid methods [19], but also in these cases the restart criterion and restart strategy are 
substantially different from the ones proposed here. 
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4. A question, mainly of theoretical interest, is about convergence. General results stating conditions under which 
convergence to the global minimum (with probability 1) is guaranteed can be found, e.g., in [20] and [21]. 
Such conditions are not fulfilled by standard DE, while IDEA fulfills them if parameter 6 C employed in the 
restart mechanism is "small enough" (if too large, some portions of the feasible region, possibly including the 
global minimum, might remain unexplored). We point out that, although in practice we are not interested in 
the behavior of an algorithm over an infinite time horizon, global convergence justifies re-running the search 
process with an increased number of function evaluations should the results be unsatisfactory. As we will show 
in the remainder of this paper, IDEA produces performance steadily increasing with the number of function 
evaluations without the need to change its settings. 



The modified differential evolution algorithm derived in Section [TT] is applied to the solution of four real world 
cases. The four cases are all multigravity assist (MGA) trajectory design problems, three of them with deep space 
manoeuvres (DSM) and one with no DSM's. In this section we describe the trajectory model and we formulate the 
global optimization problem that will be tackled through Algorithm Q] 

A. Trajectory Model with no DSM's 

Multi-gravity assist transfers with no deep space maneuvers can be modeled with a sequence of conic arcs connecting 
a number of planets. The first one is the departure planet, the last one is the destination planet and at all the 
intermediate ones the spacecraft performs a gravity assist maneuver. Given Np planets Pi with i = 1, Np — 1, 
each conic arc is computed as the solution of a Lambert's problem [22] given the departure time from planet 
Pi and the arrival time at planet Pj+i. The solution of the Lambert's problems yields the required incoming and 
outgoing velocities at a swing-by planet vi n and v rout (see Fig. [2j. The swing-by is modeled through a linked-conic 
approximation with powered maneuvers [23], i.e., the mismatch between the required outgoing velocity v rou t and 
the achievable outgoing velocity v aou t is compensated through a Av maneuver at the pericenter of the gravity assist 
hyperbola. The whole trajectory is completely defined by the departure time to and the transfer time for each leg 
T it with i = 1, ...,N P - 1. 

The normalized radius of the pericenter r p ^ of each swing-by hyperbola is derived a posteriori once each powered 
swing-by manoeuvre is computed. Thus, a constraint on each pericenter radius has to be introduced during the 
search for an optimal solution. In order to take into account this constraint, the objective function is augmented 
with the weighted violation of the constraints: 



III. Trajectory Model and Problem Formulation 



JVp-2 



N p -2 




(20) 



for a solution vector: 



x = 



[to, T\, T 2 , 7at p _i] t 



(21) 
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B. Trajectory Model with DSM's 

A general MGA-DSM trajectory can be modeled through a sequence of Np — 1 legs connecting Np celestial 
bodies (Fig. [3]) [24]. In particular if all celestial bodies are planets, each leg begins and ends with an encounter 
with a planet. Each leg i is made of two conic arcs: the first, propagated analytically forward in time, ends where 
the second solution of a Lambert's problem begins. The two arcs have a discontinuity in the absolute heliocentric 
velocity at their matching point M. Each DSM is computed as the vector difference between the velocities along 
the two conic arcs at the matching point. Given the transfer time Tj and the variable cti £ [0, 1] relative to each 
leg i, the matching point is at time tp>SM,i = tf,i-i + ^i^i, where f/,i_i is the final time of the leg i — 1. The 
relative velocity vector vo at the departure planet can be a design parameter and is expressed as: 

vo = vq [sin 6 cos (?, sin 5 sin 8, cos 8] T (22) 

with the angles 8 and 8 respectively representing the declination and the right ascension with respect to a local 
reference frame with the x axis aligned with the velocity vector of the planet, the z axis normal to orbital plane of 
the planet and the y axis completing the coordinate frame. This choice allows for an easy constraint on the escape 
velocity and asymptote direction while adding the possibility of having a deep space maneuver in the first arc after 
the launch. This is often the case when the escape velocity must be fixed due to the launcher capability or to the 
requirement of a resonant swing-by of the Earth (Earth-Earth transfers). In order to have a uniform distribution of 
random points on the surface of the sphere defining all the possible launch directions, the following transformation 
has been applied: 

g= ± ^cos(W2) + l 
2vr 2 

It results that the sphere surface is uniformly sampled when a uniform distribution of points for 8,8 £ [0,1] is 
chosen. Once the heliocentric velocity at the beginning of leg i, which can be the result of a swing-by maneuver or 
the asymptotic velocity after launch, is computed, the trajectory is analytically propagated until time tnsM,i- The 
second arc of leg i is then solved through a Lambert's algorithm, from Mi, the Cartesian position of the deep space 
maneuver, to Pj, the position of the target planet of phase i, for a time of flight (1 — a,)Tj. Two subsequent legs 
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are then joined together with a gravity assist manoeuvre. The effect of the gravity of a planet is to instantaneously 
change the velocity vector of the spacecraft. 

The relative incoming velocity vector and the outgoing velocity vector at the planet swing-by have the same modulus 
but different directions; therefore the heliocentric outgoing velocity results to be different from the heliocentric 
incoming one. In the linked conic model, the spacecraft is assumed to follow a hyperbolic trajectory with respect 
to the swing-by planet. The angular difference between the incoming relative velocity and the outgoing one v Q 
depends on the modulus of the incoming velocity and on the pericenter radius r^. Both the relative incoming and 
outgoing velocities belong to the plane of the hyperbola. However, in the linked-conic approximation, the maneuver 
is assumed to occur at the planet, where the planet is a point mass coinciding with its center of mass. Therefore, 
given the incoming velocity vector, one angle is required to define the attitude of the plane of the hyperbola n. 
Although there are different possible choices for the attitude angle 7, the one proposed in Ref. 7 has been adopted 
(see Fig. @), where 7 is the angle between the vector nn, normal to the hyperbola plane n, and the reference vector 
n r , normal to the plane containing the incoming relative velocity and the velocity of the planet vp. 
Given the number of legs of the trajectory Nl = Np — 1, the complete solution vector for this model is: 

x =[v ,6, S,t ,ai,Ti,ji,r Pt i,a 2 ,T 2 , (24) 

7ij Tj_i, Ct!,_i, 7N L -l,r Pt N L -i,aN L , T Nl ] 

where to is the departure date. Now, the design of a multi-gravity assist transfer can be transcribed into a general 
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nonlinear programming problem, with simple box constraints, of the form: 

min/(x) (25) 

One of the appealing aspects of this formulation is its solvability through a general global search method for box 
constrained problems. Depending on the kind of problem under study, the objective function can be defined in 
different ways. Here we choose to focus on minimizing the total Av of the mission, therefore the objective function 
/(x) is: 

N p 

f{x) = vo + Y j Av i + Av f (26) 

where Au, is the velocity change due to the DSM in the i-th leg, and Av/ is the maneuver needed to inject the 
spacecraft into the final orbit. 

IV. Test Problems 

We consider a benchmark made of four different test-cases: two versions of the MGA transfer from the Earth to 
Saturn of the Cassini-Huygens mission, a multi-gravity assist transfer to the comet 67P/Churyumov-Gerasimenko 
(similar to the Rosetta mission), and a multi-gravity assist rendezvous transfer with mid-course manoeuvres to 
Mercury (similar to the Messenger mission). Algorithm [T] called IDEA, is applied to the solution of the four cases 
and compared to standard Differential Evolution [5] and Monotonic Basin Hopping [14], [25]. 

A. Cassini with no DSM's 

The first test case is a multi gravity assist trajectory from the Earth to Saturn following the sequence Earth- Venus- 
Venus-Earth-Jupiter-Saturn (EVVEJS). There are six planets and the transfer is modeled as in Section IIII-AI thus 
the solution vector is: 

X = [t ,r 1 ,T 2 ,r 3 ,r 4 ,T 5 ] T (27) 

The final Avf is the Av needed to inject the spacecraft into an ideal operative orbit around Saturn with a pericenter 
radius of 108950 km and an eccentricity of 0.98. The weighting functions Wi are defined as follows: 

Wi = 0.005[1 - sign(r p ,i - r pmirM )], i = 1, ...,3 (28) 
Wi = 0.0005[1 - sign(r Pi4 - r pminA )] 

with the minimum normalized pericenter radii r pm i n ,i = 1.0496, r pm j nj 2 = 1.0496, r prn i n ,3 = 1.0627 and 
f pm in,4 = 9.3925. For this case the dimensionality of the problem is six, with the search space D defined by 
the following intervals: t G [-1000, 0]MJD2000, T x e [30,400]d, T 2 e [100,470]d, T 3 e [30,400]d, T 4 £ 
[400, 2000]d, T 5 G [1000, 6000]d. The best known solution is f best = 4.9312 km/s, with x best =[-789.75443770458, 
158.301628961437, 449.385882183958, 54.7050296906556, 1024.5997453164, 4552.72068790619] T . 
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B. Cassini with DSM's 

The second test case is again a multi gravity assist trajectory from the Earth to Saturn following the sequence Earth- 
Venus-Venus-Earth-Jupiter-Saturn (EVVEJS), but a deep space manoeuvre is allowed along the transfer arc from 
one planet to the other according to the model presented in Section UlI-BI Although from a trajectory design point 
of view, this problem is similar to the first test case, the model is substantially different and therefore represents 
a different problem from a global optimization point of view. Since the transcription of the same problem into 
different mathematical models can affect the search for the global optimum, it is interesting to analyze the behavior 
of the same set of global optimization algorithms applied to two different transcriptions of the same trajectory 
design problem. 

Here, Avf is defined as the modulus of the vector difference between the velocity of Saturn at arrival and the 
velocity of the spacecraft at the same time. For this case the dimensionality of the problem is 22, with the search 
space D is defined by the following intervals: t G [-1000, 0]MJD2000, v Q G [3,5]km/s, 9 G [0,1], 6 G [0,1], 
Ti G [100,400]d, T 2 G [100,500]d, T 3 G [30,300]d, T 4 G [400, 1600]d, T 5 G [800,2200]d, a x G [0.01,0.9], 
a 2 G [0.01,0.9], a 3 G [0.01,0.9], a 4 G [0.01,0.9], a 5 G [0.01,0.9], r Pjl G [1.05,6], r pa G [1.05,6], r p , 3 G 
[1.15,6.5], r pA G [1.7,291], 71 G [0,2vr], 72 G [0,2tt], 73 G [0,2tt], 74 G [0,2tt]. The best known solution is 
f best = 8.3889 km/s, x 6est =[-780.917853635368, 3.27536879103551, 0.782513100225235, 0.378682006044345, 
169.131920055057,424.13242396494, 53.296452710059, 2199.98648654574, 0.795774035295027,0.530055267286, 
0. 126002760289258, 0.0105947672634,0.0381505843013, 1.35556902792788, 1.05001740672886,1.30699201995999, 
71.3749247783128, 3.15842153037544, 3.53046280721895, 3.12561791754698, 3.08422162979462] T . 

C. Rosetta Mission 

The third test case is a multi gravity assist trajectory from the Earth to the comet 67P/Churyumov-Gerasimenko 
following the gravity assist sequence that was planned for the spacecraft Rosetta: Earth-Earth-Mars-Earth-Earth- 
Comet. The trajectory model is the one described in Section IIII-BI but the objective function does not include 

Vq. 

For this case the dimensionality of the problem is 22, with the search space D is defined by the following intervals: 
t G [1460, 1825]MJD2000, v„ G [3,5]km/s, 9 G [0,1], 6 G [0,1], T 4 e [300,500]d, T 2 e [150,800]d, T 3 G 
[150,800]d, T 4 G [300,800]d, T 5 e [700, 1850]d, a a G [0.01,0.9], a 2 G [0.01,0.9], a 3 G [0.01,0.9], a 4 G 
[0.01,0.9], a 5 G [0.01,0.9], r pA G [1.05,9], r Pj2 G [1.05,9], r p>3 G [1.05,9], r pA e [1.05,9], 71 G [0,2tt], 
72 G [-7T, vr], 73 G [0, 2vr], 74 G [0, 2tt]. 

The best known solution is f best = 1.34229 km/s, with solution vector x best =[1542.65536672006, 4.48068107888312, 
0.935220667497966, 0.9909562486258, 365.24235847396, 707.540858648698, 257.417859715383, 730.483434305258, 
1850, 0.310501108489873,0.809061227121068,0.0124756484551758,0.0466967002704,0.43701236871638,1.8286351998512, 
1.05, 2.8097351 1169638, 1.18798981835459, 2.61660601734377,-0.215250274241349, 3.579503141 15394, 3. 46467471 264343 ] T . 
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D. Messenger Mission 

The fourth problem is a multi-gravity assist trajectory from the Earth to planet Mercury following the sequence of 
planetary encounters of the first part of the Messenger mission. As for the previous test case, the trajectory model 
is the one described in Section UlI-BI 

For this case the dimensionality of the problem is 18, with the search space D is defined by the following intervals: 
t Q G [1000,4000]MJD2000, v Q G [l,5]km/s, 9 G [0,1], 8 G [0,1], T x G [200,400]d, T 2 G [30,400]d, T 3 e 
[30,400]d, T 4 G [30,400]d, a x G [0.01,0.99], a 2 € [0.01,0.99], a 3 G [0.01,0.99], a A € [0.01,0.99], r pA € [1.1,6], 
r p ,2 G [1.1.6], r p<3 G [1.1,6], 7i G [— 7T,7r], 72 £ [-7T,7r], 73 G [-7T,7r]. 

The best known solution is f best = 8.631 km/s, with solution vector x 6est =[l 171. 14619813253, 1.41951376601752, 
0.628043728560056, 0.500000255697689, 399.999999999969, 178.9214691 1 1868, 299.279691870106, 180.6891 14497891, 
0.236414009949924,0.0674215615945254,0.832992171208578,0.312514378885353,1.7435422021558,3.03087330660319, 
1.10000000000119, 0.219820823285448, 0.477475660779879, 0.2258981 17795826] T . 

Note that, the search space for each one of the trajectory models is normalized so that D is a unit hypercube with 
each component of the solution vector belonging to the interval [0, 1]. Furthermore, for all cases, solution algorithms 
were run for a progressively increasing number of function evaluations from 100000 to 1.25 million. 

V. Testing Procedure 

The modified DE algorithm will be compared against standard DE and MBH following a rigorous testing procedure. 
A detailed description of the testing procedure can be found in [26] and it is here summarized in Algorithm [2] for 
a generic solution algorithm A and a generic problem p. Here 5t(A, i) denotes the best point observed during the 
z-th run of algorithm A. 

The index of performance j s is the number of successes of the algorithm A. In the following we use the f\,est 
values reported above in place of f global and we consider only 8f(x.(A,i)) and not 8 x (x.(A,i)). 
The success rate, p s = j s /n, will be used for the comparative assessment of the algorithm performance instead of 
the commonly used best value, mean and variance. Indeed, the distribution of the function values is not Gaussian. 
Therefore, the average value can be far away from the results returned with a higher frequency from a given 
algorithm. In the same way, the variance is not a good indicator of the quality of the algorithm because a high 
variance together with a high mean value can correspond to the case in which 50% of the results are close to the 
global optimum with the other 50% far from it. Finally, statistical tests, such as the t-test, that assume a Gaussian 
distribution of the sample can not be applied to correctly predict the behavior of an algorithm. Instead, the success 
rate gives an immediate and unique indication of the algorithm effectiveness, and, moreover, it can be always 
represented with a binomial probability density function (pdf), independent of the number of function evaluations, 
the problem and the type of optimization algorithm. 

A key point is setting properly the value of n to have a reliable estimate of the success probability of an algorithm, 
or success rate p s = j s /n. Since the success is binomial (assumes values that are either or 1) we can set a priori 
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the value of n to get the required confidence on the value of p s . A commonly adopted starting point for sizing the 
sample of a binomial distribution is to assume a normal approximation for the sample proportion p s of successes 
(i.e. p s ~ N{6p,6 p (l — 6 p )/n}, where 6 P is the unknown true proportion of successes) and the requirement that 
Pr[|p s — 6 P \ < d err \9 p ] is at least equal to 1 — a p [27]. This leads to the expression n > 6 P (1 — & P )xfi) a ld\ rr 
that can be approximated conservatively with 

n > 0.25 X 2 (1)<ap /dl rr (29) 

valid for 6 P — 0.5. According to Eq. (1291 1. an error < 0.05 (d err = 0.05) with a 95% of confidence (a p = 0.05) 
would require at least n — 175. If n is extended to 1000, the error reduces to 0.020857. In the following, we 
will use n = 200 and N = 1.25e6 for tuning the algorithms, and n = 1000 with variable N to compare their 
performance. In fact, a reduced error is required to discriminate between the performance of two algorithms at low 
N. 

The values of tolf for the four test cases are: tolf — 0.0688 km/s for Cassini with no DSM's, tolf = 0.1111 km/s 
for Cassini with DSM's, tolf — 0.05778 km/s for Rosetta and tolf — 0.05 km/s for Messenger. The choice of 
these thresholds was dictated by the need to discriminate among different minima. At the same time they represent 
a reasonable margin on the total Av. In fact, during standard preliminary designs, a margin between 3% and 5% is 
typically added for contingencies, while here all the selected thresholds are below 5% of the value of the objective 
function. In other words, all the minima within the thresholds would be indistinguishable from a mission design 
point of view. 

A. Parameter Tuning 

Prior to running the tests on all the four cases, the key parameters for DE and MBH were tuned to get the best 
performance. We used the Rosetta case as a tuning example. The tuning of DE and MBH is used also to tune 
IDEA. The tuning of MBH was relatively fast as there are only two parameters: the size of the neighborhood and 
the number of samples before global restart. 

Figure [5] shows the performance of MBH on the Rosetta case for different values of A and n sam pi es . It can be 
noted that the performance tends to increase for a number of samples that tend to infinity. On the other hand, 
there is a peak of performance around n samp i es = 30. In the remainder of this paper, we will call MBH-GR the 
version of MBH implementing a global restart after 30 unsuccessful samples and we call MBH, the version with 
n S ampies = oo. The optimum A seems to be 0.1, therefore this value was used for all the test cases. 
Note that the general trend of the performance of MBH does not change for the other cases and therefore the 
settings seem to be of general validity for this benchmark of test problems. 

For the tuning of DE, we instead considered a grid of values for F, Cr and population size, for different strategies. 
Figs. |6] to [9] show the most significant results. From these figures one can deduce that strategy DE/best/l/bin with 
F in the range [0.8, 1] and a population below 200 individuals would be a good choice. Alternatively strategy 
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DE/rand/l/bin can be used with a smaller population and an F in the range [0.6, 0.7]. In both cases Cr is better 
at around 0.8 as the problem is not separable and the components of the solution vector are interdependent. 
The combination of strategy and values of F, n pop controls the speed of local convergence to a fixed point of the 
algorithm. In the following tests, therefore, we decided to use strategy DE/best/l/bin with two sets of populations, 
[5d, 10 d\, where d is the dimensionality of the problem, with single values of step-size and crossover probability 
F = 0.75 and Cr = 0.8 respectively. The two settings will be denoted with DE5c, DElOc. The trends in Figs. [6] 
to [9] can be registered also for the other two test cases, although the performance of DE is much poorer than for 
Rosetta, therefore it can be argued that the settings have general validity. These settings are also in line with the 
theory developed by Zaharie in [28]. 

The settings of IDEA were derived from the individual tuning of DE and MBH. In particular, we took a value 
Cr — 0.9, as the variables are not decoupled, a convergence strategy for the choice of the indexes in Eq. 
and a value F = 0.9 together with a small population to have a fast convergence but without losing exploration 
capabilities. We used an n pop = 20 for Cassini with no DSM's and for Messenger, while an n pop — 40 for Rosetta 
and Cassini with DSM's. For all the test cases 5 C — 0.1, tol conv = 0.25, while A = 0.2 for all the cases except 
for Messenger for which we used 0.25 instead. The parameter controlling the maximum number of local restarts 
is iun m ax — 6 for Messenger, iun max = 2 for Rosetta, iun max — +oo for Cassini with and without DSM's as no 
guided restart is applied. 

B. Test Results 

The performance of all the algorithms are summarized in Figs. [10] QT| Q~2] and [13] For the Cassini case with no 
DSM's, with results shown in Fig. [To] IDEA performs exceptionally well compared to all the other algorithms 
providing a success rate over 50% at 200000 function evaluations. Both versions of DE exhaust their exploration 
capabilities quite soon and an increase of the number of function evaluations does not help as the population has 
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DE on the Rosetta test case. C =0.8 Strategy=DE/best/l/bin 
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Fig. 6. Performance of Differential Evolution on the Rosetta test case, with strategy DE/best/l/bin and Cr=0.& 

DE on the Rosetta test case. C R =0.S Strategy=DE/rand71/bin 

0.4, 
0.3- 



oT 0.2- 




Fig. 7. Performance of Differential Evolution on the Rosetta test case, with strategy DE/rand/l/bin and Cjj=0.8 



collapsed to a fixed point. The performance is not nearly as good on the Cassini case with DSM's, as seen in Fig. 

LTD 

Although IDEA is still better than the other algorithms, and in particular than standard DE, it is comparable to MBH 
up to 600000 function evaluations and achieves a moderate 30% as best result at 1.25 million function evaluations. 
This poor performance seems to be due mainly to problems of local convergence. Note, however, how the general 
trend suggests that IDEA is not stagnating as the success rate is steadily increasing for an increasing number of 
function evaluations. On the contrary DE seems to reach a flat plateau. Even for the Rosetta case, IDEA displays 
exceptionally good results with DE5c, DElOc, MBH and MBH-GR substantially equivalent until N < 800fc. After 
that, DE reaches a plateau and stops exploring. Both IDEA and MBH, instead, show a positive monotonic trend 
until N = 1.25M, but IDEA outperforms both versions of MBH. 

In the Messenger case, all algorithms do not perform particularly well if N < 500k. Due to the structure of the 
problem, IDEA needs more time to converge under the tol conv threshold and only after many re-sampling iterations 
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DE on the Rosetta test case, n =1 10 Strate2y=DE/best/l/bin 
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Fig. 8. Performance of Differential Evolution on the Rosetta test case, with strategy DE/best/l/bin and n pov -\\0 



DE on the Rosetta test case, n =110 Strateey=DE/rand/l/bin 

pop 




Fig. 9. Performance of Differential Evolution on the Rosetta test case, with strategy DE/rand/l/bin and n vop -\\0 



is able to show good performance. On the other hand, all DE's do not display any significant improvement for 
more than 500fc evaluations and even MBH can be considered equally ineffective (the difference between DE and 
MBH is lower then the confidence interval). 

VI. Search Space Analysis 

The different behaviors of IDEA, DE and MBH on the four test cases can be understood with an analysis of the 
structure of the search space. The collection of the results from the tests can be used to deduce some properties 
of the problems within the benchmark and to predict the behavior of the solution algorithms. An understanding of 
the characteristics of the benchmark is required to generalize the result of the tests. In fact, every consideration on 
the performance of the algorithms is applicable only to problems with similar characteristics. 
All local minima found in all the tests by the applied global methods were grouped according to the value of their 
objective function. Specifically, the range of values of the objective function for each test was divided in a finite 
number of levels, with each group of minima associated to a particular level. 
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Fig. 10. Variation of the success rate with the number of function evaluations, for the Cassini without DSM's test case 



Cassini with DSM's 




Function evaluations , „5 

x 10 



Fig. 11. Variation of the success rate with the number of function evaluations, for the Cassini with DSM's test case 



Then, we computed the average value of the relative distance of each local minimum with respect to all other local 
minima within the same level du (or intra-level distance), and the average value of the relative distance of each 
local minimum with respect to all other local minima in the lower level du (or trans-level distance). The du for 
the lowest level is the average distance with respect to the best known solution. 

The values du and d t i give an immediate representation of the diversity of the local minima and the probability of a 
transition from one level to another. More precisely, a cluster of minima with a large intra-level distance and a small 
trans-level distance suggests an easy transition to lower values of the objective function and a possible underlying 
funnel structure [12]. In particular in the case of funnel structures, the values of d t i and du should progressively go 
to zero. A du that does not go to zero or clusters with different values of du, are the cue to a possible underlying 
multi-funnel structure. 
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Fig. 12. Variation of the success rate with the number of function evaluations, for the Rosetta test case 
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Fig. 13. Variation of the success rate with the number of function evaluations, for the Messenger test case 



Fig. [T4]provide two illustrative examples. Fig. [14(a)] represents a single funnel structure with five local minima x^i 
where i = 1, 5, and three levels. The intra-level distance at level 2, given by the distance c?24 = 2^,2 — x/,4, is 
lower than dis = X11 — Xi t §, the intra-level distance at level 1. The same is true for the trans-level distance at level 
2, d,23, which is lower than the trans-level distance at level 1, (d\2 + du)/2, for minimum x^\. 
Fig. 1 14(b)] instead, represents a bi-funnel structure. In this case, the minima around xi.§ have an average intra-level 
distance lower than xi } 2 but a trans-level distance c?63 much larger than 1^23. Thus, the two minima xi : 2 and Xi$ 
will appear on the d t i-du graph with different values of du and d t i- If the threshold of level 3 were increased above 
the objective value of xi$, then all minima of level 2 would have similar du, but the du at level 3 would not go 
to zero. 

This analysis method is an extension of the work of Reeves and Yamada [9], and is used to concisely visualize 
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Fig. 14. One dimensional example of a) single funnel structure and b) bi-funnel structure. 
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Fig. 15. Relative distance of the local minima for the Rastrigin function in 5 dimensions 



the distribution of the local minima. The definition of the levels depends on the groups of minima of interest, 
and can be derived from mission constraints or by an arbitrary subdivision of the range of values of the objective 
function. Different subdivisions reveal different characteristics of the search space but give only equivalent cues on 
the transition probability. 

As an example of application of the proposed search space analysis method, Figs.ll5land [T6l show the da-d t i plots 
for two well known functions: Rastrigin and Schwefel. In both cases the dimensionality is 5. The Rastrigin function 
is known to have a single funnel and to be globally convex (see the example of one-dimensional Rastrigin function 
in Fig. |17(a)) . The clusters in the du-d t i plane are aligned along the diagonal and converge to 0. In the case of the 
Schwefel function, which has no single funnel structure (see the example of one-dimensional Schwefel function in 
Fig. |17(b)[ ), the clusters are more scattered and both the da and d t i values tend to remain quite high. 
When applied to the benchmark of space trajectory problems, the analysis approach seems to suggest (see Fig. 1 18(a)) 
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Fig. 16. Relative distance of the local minima for the Schwefel function in 5 dimensions 




that the Cassini case with no DSM's has a structure similar to the one in Fig. 1 14(b)] the minima at level 5 belong to 
two distinct clusters with substantially different d t i and du. The clusters, corresponding to levels 1, 2 and 3, have 
values of d t i and du both lower than 0.2, which suggests an easy transition from one level to an another. Thus, 
below an objective function of 7.5 km/s there seems to be an underlying single funnel structure. Note that an easy 
transition among levels favors the search mechanism of MBH as demonstrated by the test results. 
Fig. |18(b)| shows that both dti an d du progressively tend to zero up to a certain point, after which da goes to 
zero while du remains almost unchanged. The figure suggests that, in the Cassini case with DSM, there is a single 
funnel structure for function values above 9.5 km/s, while below the minima are scattered and distant from each 
other. 

From Fig. [19] we can argue that Rosetta has a wide zone containing the global minimum together with a number of 
local minima with similar cost function. In fact the blue stars belonging to level 1 are distributed over values of du 
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in the range [0.0, 1.9] and du in the range [1.2, 1.6]. It should be noted that from a practical point of view, solutions 
with a difference in the total Av of less than 50 m/s are equivalent, especially in the preliminary design phase of 
a mission. Therefore, all the blue stars belonging to level 1 are potentially good candidates for a space mission. 
Note that, the solutions at level 1 are very far apart. Thus, a transition within this group and the identification of 
the global minimum can be problematic. 

In addition to the solutions at level 1, the search space of Rosetta presents two interesting groups at level 3 and level 
4. These two sets of local minima have a relatively low inter-distance du but are far from the global minimum, since 
du is quite high. They represent two strong attractors that explain why, for example, the actual Rosetta mission has 
a total Av a bit higher than 1.7 km/s. 

The Messenger mission problem presents a different structure, see Fig. [20] The structure of the search space of the 
Messenger test case appears to be characterized by many small basins separated from one another. Even the level 
under the threshold adopted to compute the success rate contains two distinct, yet similar, regions. 

VII. Conclusion 

In this paper we casted an evolutionary heuristic in the form of a discrete map. The discrete map can be seen as a 
variant of Differential Evolution. We then demonstrated empirically and theoretically that the map has a number of 
fixed points to which it converges asymptotically under some assumptions on the structure of the search space. This 
result suggested the development of an algorithm that outperforms Differential Evolution on some difficult space 
trajectory design problems. The novel algorithm displays a remarkable robustness, i.e., the ability to repeatedly 
converge to solutions with a value of the cost function close to the best known solution to date. Furthermore, it 
shows the desirable characteristic of increasing its performance with the number of function evaluations, reaching 
in some cases success rates which are up to around 25 times higher than the standard DE. These considerations 



January 20, 2013 



DRAFT 



JOURNAL OF KTpX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007 



26 




2r 



1.5 



0.5 




* level- 


-1 


f <8.68 


C> level- 


-2 


f <8.75 


+ level- 


-3 


f<8.90 


< level- 


-4 


f<9.00 


o level- 


-5 


f<9.50 


level- 


-6 


f<10.00 


■ level- 


-7;f<12.50 



0.5 



1.5 



Fig. 20. Relative distance of the local minima for Messenger 



can be generalized to all problems with similar characteristics of the search space. 
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Algorithm 1 Inflationary Differential Evolution Algorithm (IDEA) 

1: Set values for n pop , Cr, F, iun max , and tol conv , set nf eva i — and k = 1 

2: Initialize x,^ and v^fc for all i G [1, n pop ] 

3: Create the vector of random values r e U[0, 1] and the mask e = r < Cr 

4: for all i e [1, ... 1 do 

5: Select three individuals x i± , x i2 , x i3 

6: Create the vector u iik = e[(x i3ifc - x l . k ) + F(x l2tk - x ilife )] 
7: v a+ i = (1 - c)v a + u a 
8: Compute 5 and zv 

9: X^ fe+ i = X^fc + SVv^fc+i 
10: Tlfeval = Hf eva l + 1 

ll: end for 

12: k = k + 1 

13: pa = max(||x a - Xj ife ||) for Vx a ,x j>/s e P su6 C P fc 

14: if pA < tol conv pA,max then 

15: Run a local optimizer ai from Xf, est and let x; be the local minimum found by ai 
16: if /(x ; ) < finest) then 

17: fbest <- /(Xj) 

18: end if 

19: if /(x best ) < / mm then 

20: /mm ffabest) 

21: iun = 

22: else 

23: ran = inn + 1 

24: end if 

25: if iun < iun max then 

26: Define a bubble such that Xb es t 6 A f° r Xb est g P su 6 and VP SU ;, C P k 

27: A g = A g + {x 6est } where x best = argmin; /(x i;fe ) 

28: Initialize x, ; fc and v»,fc for all G [1, n p0 p], in the bubble Di C D 

29: else 

30: Define clusters in the archive and compute the baricenter x CJ of each cluster with j = l,...,n c . 

31: Initialize x iyk and Vi ik for all i G [1, ...,n pop ], in D such that Vi,j, ||x, ; fc — x CJ || > S c 

32: end if 

33: end if 

34: Termination Unless n feva i > n feva i max , goto Step 3 
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Algorithm 2 Testing Procedure 

1: Set to N the max number of function evaluations for A 

2: Apply A to p for n times and set j s = 

3: for all i G [1, n] do 

4: Compute 5 f (5t(A,i)) =| / 9 ; ofcQ ; - f(5t(A,i)) |; and ^(x(A,i)) = \\x g i bai -5t(A,i)\\ 
5: if (<5/(x(A,i)) < tolf) A (S x (x(A,i)) < tol x ) then j s = j s + 1 
6: end if 

7: end for 
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